
European Heart Journal - Digital Health

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match European Heart Journal - Digital Health's content profile, based on 15 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 0.1%
26.8%

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%. After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

2
Enhanced Demographically Adaptive QT Correction Improves Pediatric Screening for Congenital Long QT Syndrome

Haq, K.; Berul, C.; Posnack, N.

2026-05-19 cardiovascular medicine 10.64898/2026.05.14.26353243 medRxiv
Top 0.1%
17.0%

Background: Traditional heart rate (HR) adjusted QT correction (QTc) formulae often fail to eliminate the inverse HR-QT interval relationship, particularly in pediatric patients. In this study, we optimized our previously published adaptive QTc (QTcAd) formula by including additional demographic variables and broadening the pediatric age range. We tested the hypothesis that QTcAd improves congenital long QT syndrome (congenital LQTS) detection performance and reduces erroneous classifications across pediatric cohorts. Methods: We retrospectively analyzed 8,306 ECGs from 4,556 cardiovascular disease (CVD)-free pediatric patients. For neonatal patients (1-30 days old), we derived daily QTcAd parameter values. For older patients, we developed regression models to estimate QTcAd parameters (mean Heart Rate (HR) = -15.9ln(days) + 219; |m| = 0.0001(days) + 1, where |m|=absolute HR-QT regression slope). To support LQTS screening, we constructed dynamic QTcAd thresholds by estimating age-specific reference limits. Diagnostic performance was tested in a clinically confirmed LQTS cohort (n=137), and further evaluated in the Pediatric Heart Network (PHN; n=2,394) and Emergency Department (ED; n=2,002) cohorts. Results: Using the confirmed LQTS cohort as the event population and the CVD-free cohort as the non-event population, QTcAd demonstrated higher sensitivity than QTcB (92% vs 46.7%). QTcAd maintained high specificity (96.9% vs 98.9%), which resulted in a higher Youden index (0.889 vs 0.456). In the PHN healthy cohort, both QTc formulae classified the majority of individuals as normal (QTcAd 95%; QTcB 98.2%) indicating few false-positives. In the ED cohort, QTcAd reduced borderline/prolonged QTc classifications requiring follow-up, yielding 270 fewer repeat-testing triggers than QTcB. We developed a publicly accessible calculator to compute QTcAd and classify congenital LQTS risk. Conclusion: We developed and validated an enhanced QTcAd formula for pediatric patients. 
QTcAd-based, age-adjusted dynamic thresholding improved performance for congenital LQTS screening while maintaining high specificity. This reduces false-positive LQTS classifications and repeat ECGs, thereby decreasing unnecessary downstream clinical evaluation.
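The age regressions and Youden indices quoted in the abstract above can be checked with a short sketch. This is an illustration only: the function names are hypothetical, `days` is assumed to mean patient age in days, and the complete QTcAd correction formula is not stated in the abstract, so it is not reproduced here.

```python
import math

def mean_hr(age_days: float) -> float:
    """Mean heart rate regression from the abstract: -15.9 * ln(days) + 219.
    Assumption: 'days' is age in days, for patients older than the neonatal range."""
    return -15.9 * math.log(age_days) + 219

def abs_slope(age_days: float) -> float:
    """Absolute HR-QT regression slope from the abstract: |m| = 0.0001 * days + 1."""
    return 0.0001 * age_days + 1

def youden(sensitivity: float, specificity: float) -> float:
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1

# Sensitivity and specificity as reported in the abstract.
j_qtcad = round(youden(0.92, 0.969), 3)   # abstract reports 0.889
j_qtcb = round(youden(0.467, 0.989), 3)   # abstract reports 0.456
```

Both reported Youden indices follow directly from the stated sensitivity and specificity, which is a quick consistency check on the headline comparison.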

3
Does ECG-Based AI Detect Aortic Stenosis Beyond Conventional LVH Criteria? An Analysis of the CLIDAS Database

Shimada, T.; Kodera, S.; Sawano, S.; Guan, J.; Saitoh, W.; Wakasa, S.; Ito, S.; Yanagishita, T.; Hayashi, Y.; Shibata, A.; Ito, A.; Otsuka, K.; Higashikuni, Y.; Okamura, H.; Tsujita, K.; Node, K.; Yamaguchi, O.; Makimoto, H.; Kabutoya, T.; Imai, Y.; Nakayama, M.; Sato, H.; Fujita, H.; Kohro, T.; Matoba, T.; Takeda, N.; Fukuda, D.; Nagai, R.

2026-06-08 cardiovascular medicine 10.64898/2026.06.07.26355087 medRxiv
Top 0.1%
14.3%

Background: Aortic stenosis (AS) is a progressive valvular disease associated with poor prognosis once symptoms develop, yet routine echocardiographic screening is impractical. While artificial intelligence (AI)-based electrocardiogram (ECG) models have shown promise for AS detection, it remains unclear whether they primarily reflect conventional left ventricular hypertrophy (LVH) voltage criteria or capture additional ECG features. Methods and Results: We developed a deep learning model using 244,816 ECGs from 51,713 patients across six academic institutions in Japan (CLIDAS database). AS labels were derived from inpatient Diagnosis Procedure Combination (DPC) codes. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.849 (95% confidence interval 0.832-0.865) in the independent test cohort, with consistent performance across institutions, sex, and age. At a threshold of 0.1, sensitivity was 79.1%, specificity was 73.9%, and negative predictive value (NPV) was 98.0%. Conventional LVH voltage criteria (Sokolow-Lyon AUC 0.706; Cornell AUC 0.692) showed lower performance, and adding them to the AI model conferred no incremental benefit (AUC 0.849 vs. 0.847). Gradient-weighted class activation mapping (Grad-CAM) revealed predominant attention around QRS complexes in limb leads, beyond regions typically assessed in LVH evaluation. Conclusions: This multicenter AI-ECG model demonstrated strong discrimination for AS and captured ECG features beyond conventional LVH voltage criteria. The high NPV supports its use as a rule-out pre-screening tool.
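Negative predictive value depends on disease prevalence as well as on sensitivity and specificity, which matters when judging the rule-out claim above. The sketch below applies the standard NPV identity to the reported operating point. The back-calculated prevalence is an inference from rounded figures, not a number the authors report, and the function names are hypothetical.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = TN / (TN + FN), expressed per unit of population."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

def prevalence_for_npv(sensitivity: float, specificity: float, target_npv: float) -> float:
    """Invert the NPV identity to find the prevalence that yields target_npv."""
    a = specificity * (1 - target_npv)
    b = target_npv * (1 - sensitivity)
    return a / (a + b)

# Operating point reported at threshold 0.1: sensitivity 79.1%, specificity 73.9%, NPV 98.0%.
implied_prevalence = prevalence_for_npv(0.791, 0.739, 0.980)  # roughly 0.067
```

Under these assumptions the reported NPV corresponds to a test-cohort prevalence of roughly 7%; in a lower-prevalence screening population the same model would show a higher NPV, and in a higher-prevalence one a lower NPV.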

4
Beyond Agreement: a real-world study of the workflow gap between echocardiography and timely structural cardiac assessment. How a Validation Study Exposed a Hidden Gap in Cardiac Care

Nogueira, M. A.; Ferreira, F. C.; Batista, E.; Eira, S.; Proenca, G.; Matias, C.; Kecskes, I.

2026-05-15 cardiovascular medicine 10.64898/2026.05.12.26352129 medRxiv
Top 0.1%
14.2%

Objectives To assess agreement between Cardio-HART (CHART) and echocardiography for left ventricular ejection fraction (LVEF) estimation and heart failure (HF) classification in a real-world predominantly ischaemic cohort, while examining whether a point-of-care structural and functional assessment tool could reveal a broader workflow gap between the nominal availability of echocardiography and timely cardiac assessment in routine care. Design Prospective single-centre cohort study. Setting Secondary-care cardiology service at Cascais Hospital, Lisbon, Portugal. Participants Forty-seven adults referred for cardiology evaluation with suspected HF or followed in a hospital HF clinic. Primary and secondary outcome measures Agreement between CHART-derived and echocardiographic LVEF by Bland-Altman analysis; diagnostic performance for HF phenotypes; comparison with the Teichholz method. Results Mean age was 65.6 ± 15.9 years; 78.7% of participants had HF and 43.2% of HF cases were ischaemic. CHART showed a mean LVEF bias of +1.92% versus echocardiography, with 95% limits of agreement from -14.6% to +18.4% and a mean absolute error of 6.09%. Agreement was strongest in HF with reduced ejection fraction (HFrEF) and HF with mildly reduced ejection fraction (HFmrEF), and lower in HF with preserved ejection fraction (HFpEF). Diagnostic area under the curve for HFrEF classification was 0.89. Compared with the Teichholz method, CHART showed a lower root mean square error relative to Simpson's biplane LVEF. Conclusions CHART showed clinically credible performance for LVEF estimation and HF stratification, particularly in reduced-EF phenotypes. However, the most important finding of this study was not agreement alone. By performing credibly in a cardiology-based real-world setting, CHART exposed a previously under-recognised workflow gap between the nominal availability of echocardiography and timely access to structural cardiac assessment in routine care.
The study therefore suggests that the value of CHART lies not only in diagnostic performance, but in making visible, and potentially narrowing, a hidden but consequential gap in cardiac assessment pathways. Larger studies are warranted, particularly for HFpEF and across broader clinical workflows.
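The Bland-Altman figures reported above (bias and 95% limits of agreement) follow a standard recipe: take the paired differences, then report their mean and the mean ± 1.96 standard deviations. A minimal sketch, assuming the conventional 1.96 multiplier and using made-up toy values rather than study data:

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Toy LVEF values (%), purely illustrative.
chart_lvef = [35.0, 42.0, 55.0, 60.0, 28.0]
echo_lvef = [33.0, 45.0, 52.0, 61.0, 25.0]
bias, lower, upper = bland_altman(chart_lvef, echo_lvef)

# Standard deviation of differences implied by the reported limits (-14.6% to +18.4%),
# assuming they were computed as bias ± 1.96 SD.
implied_sd = (18.4 - (-14.6)) / (2 * 1.96)
```

If that assumption holds, the reported limits imply a standard deviation of differences of about 8.4 LVEF percentage points, which is wide relative to the 10-point bands that separate HF phenotypes.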

5
AutoClip: AI-Guided TEE Semantic Segmentation for TEER: A Proof-of-Concept Study

Chen, M.; Li, X.; Yang, K.; Taramasso, M.

2026-06-06 cardiovascular medicine 10.64898/2026.05.29.26354195 medRxiv
Top 0.1%
13.9%

Background: Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. Objectives: To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. Methods: A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. Results: The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. Conclusions: AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise.
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
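For readers unfamiliar with the two segmentation metrics named above, the sketch below computes per-class IoU and Dice from flattened label maps and averages IoU over foreground classes. It is a generic illustration with hypothetical function names and toy labels; the abstract does not describe how the authors implemented or averaged their metrics.

```python
def iou_and_dice(pred, truth, label):
    """Per-class IoU and Dice for flattened label maps of equal length.
    Returns None when the class is absent from both prediction and truth."""
    pairs = list(zip(pred, truth))
    inter = sum(1 for p, t in pairs if p == label and t == label)
    n_pred = sum(1 for p, _ in pairs if p == label)
    n_true = sum(1 for _, t in pairs if t == label)
    union = n_pred + n_true - inter
    if union == 0:
        return None
    return inter / union, 2 * inter / (n_pred + n_true)

def mean_foreground_iou(pred, truth, foreground_labels):
    """Average IoU over the foreground classes that appear in pred or truth."""
    scores = [iou_and_dice(pred, truth, c) for c in foreground_labels]
    ious = [s[0] for s in scores if s is not None]
    return sum(ious) / len(ious)

# Toy example: label 0 is background, labels 1 and 2 are foreground classes.
pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 0, 2, 2, 2]
miou = mean_foreground_iou(pred, truth, [1, 2])
```

For a single class the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is always at least as large as IoU; the gap between a training IoU near 0.93 and a test IoU of 0.46 would appear on either scale.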

6
Deep learning optimisation for cardiology: Neural Architecture Search-driven arrhythmia classification with electrocardiograms

Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.

2026-05-30 cardiovascular medicine 10.64898/2026.05.28.26354348 medRxiv
Top 0.1%
12.8%

Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease. However, the complexity of accurate interpretation limits the ECG's efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. We applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs across three continents. We performed a hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 21 Winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing arrhythmias, with AUROCs of 0.89 and 0.90, respectively. We interpret that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices. 
Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.

7
Biomarker Signal Architecture in Cardiovascular Machine Learning: Stability, Redundancy, and Minimal High-Yield Panels After Myocardial Infarction

Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.

2026-05-22 cardiovascular medicine 10.64898/2026.05.19.26353638 medRxiv
Top 0.1%
10.3%

Background: Machine-learning models based on circulating biomarkers are increasingly used in cardiovascular research; however, model performance alone provides limited insight into how the predictive signal is distributed across features. We aimed to characterize the biomarker signal architecture of a machine-learning model distinguishing ST-elevation myocardial infarction (STEMI) from non-ST-elevation myocardial infarction (NSTEMI), with a focus on signal concentration, redundancy, and conditional complementarity. Methods: We conducted a structured secondary analysis of a previously established, leakage-controlled machine-learning framework (n = 152 patients). The BIOMARKERS feature-set variant (10 biomarkers) was evaluated using outer-fold cross-validation. Model structure was interrogated using (i) leave-one-biomarker-out analysis, (ii) pairwise leave-two-out analysis with pair-excess estimation, (iii) cumulative ablation of top-ranked biomarkers, and (iv) forward reconstruction of minimal biomarker panels. Uncertainty was assessed using bootstrap resampling across folds. Results: The full biomarker model achieved a mean ROC-AUC approaching 0.94. The predictive signal was highly non-uniform, with MMP-2 showing the largest single-feature contribution (mean ΔAUC ≈ 0.16). Pairwise analysis identified conditional complementarity between selected non-lipid biomarkers, particularly MMP-2 and EMMPRIN (pair ΔAUC ≈ 0.26; positive excess over single-feature effects), whereas lipid-related markers formed a highly correlated and largely redundant sub-cluster. Cumulative ablation demonstrated rapid performance collapse following removal of top-ranked biomarkers, consistent with structural signal concentration. Forward panel analysis showed that a compact subset of biomarkers (three features) achieved performance within ~0.01 ROC-AUC of the full model, indicating the presence of a minimal high-yield panel.
Bootstrap confidence intervals suggested that small performance differences should be interpreted with caution. Conclusions: Predictive performance in this biomarker-based model arises from a structured and unevenly distributed signal architecture, characterized by a dominant core biomarker, conditionally complementary contributors, and a redundant lipid cluster. These findings highlight the importance of evaluating model structure, not only aggregate performance, and suggest that biomarker-based machine-learning systems may benefit from architecture-aware interpretation and simplification strategies.

8
Estimation of Physiological Metrics from Resting ECGs Using Deep Learning in the UK Biobank, Including submaximal exercise derived VO2max, Body Fat Percentage, and Grip Strength

Mankowski, I.; Pinter, E.; Lee, I.-M.; Raetsch, G.; Demler, O.

2026-05-13 cardiovascular medicine 10.64898/2026.05.09.26352818 medRxiv
Top 0.1%
10.0%

Maximal oxygen consumption (VO2max) is the gold standard for cardiorespiratory fitness but requires resource-intensive physical testing. Recent reports show that machine learning models can extract additional information from ECGs, yet the potential of ECG as a source of physiological metrics remains underutilized. While routinely collected resting electrocardiograms (ECGs) provide an opportunistic window into cardiorespiratory fitness, current deep learning models often struggle with cross-cohort transferability or remain dependent on active exercise data. We developed population-specific models using the UK Biobank to estimate submaximal exercise derived VO2max (N = 8,540) and a panel of other physiological metrics (sample sizes up to N = 78,265) from resting 12-lead ECGs using Patient Contrastive Learning of Representations (PCLR), an AI-based tool that converts an ECG into a set of 320 features (ECG-PCLR). Data were split 80%:20% (training:test) and models were evaluated on a set-aside test subset. We demonstrate that ECG-PCLR embeddings alone can estimate submaximal VO2max and body fat percentage with Pearson correlations (r) of 0.61 and 0.65, respectively. They also estimate systolic blood pressure, forced expiratory volume in 1 second (FEV1), and grip strength with r values from 0.31 to 0.55. Adding ECG embeddings to basic predictors (age, sex and BMI) improves submaximal VO2max prediction by an absolute ΔR² of 8% and by 1% to 13% for other physiologic parameters.

9
Integrated Right-Heart Remodeling Phenotypes and Prognosis in Tricuspid Regurgitation: An Automated Strain Echocardiography Study

Park, J.; Kwak, S.; Yoon, Y. E.; Park, J.-B.; Kim, J.; Jeon, J.; Jang, Y.; Lee, S.-A.; Bak, M.; Choi, H.-M.; Hwang, I.-C.; Lee, S.-P.; Kim, H.-K.; Kim, Y.-J.; Cho, G.-Y.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354377 medRxiv
Top 0.1%
9.8%

Background: Echocardiographic assessment of tricuspid regurgitation (TR) remains valve-centric, and right-heart remodeling is not captured. Strain parameters carry prognostic value but are evaluated in isolation. Objectives: To develop integrated right atrial (RA) and right ventricular (RV) remodeling indices using automated echocardiography and assess their utility for TR severity grading, phenotyping, and prognostic stratification. Methods: We analyzed 8,231 patients with functional TR (mild-or-greater) from two tertiary centers (2023-2024) using an automated AI-based echocardiographic solution. The RA remodeling index (RA reservoir strain/RA volume index) and RV remodeling index (RV free wall strain/RV end-diastolic area) were derived automatically; patients were classified into four RA-RV remodeling phenotypes. The primary outcome was all-cause death or heart failure (HF) hospitalization. Results: During median follow-up of 19.3 months, the primary outcome occurred in 574 patients (7.0%). Both indices outperformed individual components for severe TR discrimination (RA: AUC 0.857 vs. 0.757; RV: 0.710 vs. 0.601; both P<0.05). After multivariate adjustment, the RA (HR per unit decrease, 1.27; 95% CI, 1.09-1.49; P=0.002) and RV remodeling indices (2.32; 1.76-3.06; P<0.001) were independently associated with the primary outcome; on mutual adjustment, only the RV index retained significance and provided incremental prognostic value (ΔC-index +0.010; NRI +0.237; both P<0.05). The four phenotypes showed progressively divergent risk (log-rank P<0.001), with combined remodeling (Low RA/Low RV) carrying the highest risk. Conclusions: Automated integrated RA and RV remodeling indices improved TR severity discrimination and enabled clinically meaningful right-heart phenotyping. The RV index conferred incremental prognostic value, whereas the RA index better reflected atrial-stage remodeling and disease burden.

10
Early Prediction of Post-TAVR Left Ventricular Remodeling Using CT-Derived Radiomics and Clinical Variables

Rezaeitaleshmahalleh, M.; Masoumi, S.; Razaviamri, F.; Rouhollahi, A.; Zancanaro, E.; Danesi, T. H.; Ayers, B. C.; Jassar, A.; Sabe, A.; Nezami, F. R.

2026-06-02 cardiovascular medicine 10.64898/2026.05.28.26354361 medRxiv
Top 0.1%
8.9%

Background: Adverse left ventricular (LV) remodeling after transcatheter aortic valve replacement (TAVR) is associated with impaired functional recovery and adverse long-term outcomes, yet imaging-based risk stratification remains limited. Objectives: This study sought to determine whether CT-derived radiomic and geometric myocardial features, integrated with procedural and clinical variables, can predict adverse LV remodeling after TAVR. Methods: We retrospectively analyzed 232 consecutive TAVR recipients with paired pre- and post-procedural LV mass index (LVMI) measurements. Adverse remodeling was defined as a ≥10% increase in LVMI at follow-up. Pre-procedural CT was used to derive three-dimensional LV geometric descriptors, ray-tracing wall-thickness metrics, and myocardial texture radiomic features. Random forest classifiers were developed across six models of sequentially increasing complexity. Results: Adverse LV remodeling occurred in 52 patients (22.4%). The geometry-only model showed limited discrimination (AUC 0.62), whereas wall-thickness radiomics substantially improved performance (AUC 0.84). A multimodal pre-procedural model combining CT radiomics with pre-procedural LVMI, residual valve insufficiency, and prior coronary revascularization achieved an AUC of 0.86 (95% CI 0.73 to 0.98). Addition of post-procedural mean transvalvular gradient further improved discrimination (AUC 0.91, 95% CI 0.81 to 0.98). SHAP analysis identified post-procedural mean aortic gradient and radiomic markers of myocardial heterogeneity as the leading predictors. Conclusions: CT-derived radiomic characterization of myocardial heterogeneity provides incremental prognostic information beyond conventional geometric assessment for identifying patients at risk of adverse LV remodeling after TAVR.
These findings extend the role of pre-procedural CT beyond anatomical planning toward quantitative myocardial phenotyping and individualized risk stratification, although prospective validation is required to establish clinical utility.

11
Predicting the When: Multimodal AI for Time-to-Recurrence Analysis After Atrial Fibrillation Ablation

Yin, M.; Lai, C.; Yadav, R.; Milstein, J. A.; Thi My Tran, L.; O'Donnell, C.; Schumacher, S.; Cronin, C.; Weinstein, R.; Yamamoto, C.; Ahmad, Z.; Chen, S.; Lefebvre, A.; Ryu, J.; Lacy, A.; Thi Yee, A.; Noh, J.; Kholmovski, E.; Maggioni, M.; Calkins, H.; Spragg, D.; Trayanova, N.

2026-05-15 cardiovascular medicine 10.64898/2026.05.12.26353055 medRxiv
Top 0.1%
7.3%

Background: Catheter ablation is the most effective rhythm control strategy for atrial fibrillation (AF); however, recurrence remains common. Current post-ablation management follows largely population-level protocols, constrained by the absence of tools that can anticipate not merely whether, but when, an individual patient will experience recurrence. The emergence of multimodal artificial intelligence (AI) presents a new opportunity to address this unmet clinical need. Objective: To develop a predictive model for time-to-AF-recurrence post-ablation using pre-procedural bi-atrial imaging, clinical covariates, and procedural characteristics, within a novel multimodal AI and survival analysis framework. Methods: We analyzed a retrospective cohort of 437 AF patients who underwent catheter ablation with follow-up censored at 36 months. MARTA-AF (Multimodal AI Recurrence and Time-to-event Analysis post-Ablation in AF) was trained on pre-procedural bi-atrial images, and covariates/procedural characteristics, and integrated into a survival model to generate time-varying recurrence probability estimates. Model interpretability was achieved by quantifying contribution of covariates/procedural characteristics to predicted survival probabilities. Results: MARTA-AF successfully predicted time-varying recurrence risk up to three years post-ablation. Patients were effectively stratified into low- and high-risk groups, with statistically significant discrimination sustained over the follow-up period. The model demonstrated consistent performance across clinically relevant subgroups, including sex, age, and AF type. Incorporation of right atrial shape features improved time-to-AF-recurrence prediction. Interpretability analyses identified key recurrence predictors. Conclusions: MARTA-AF delivers individualized, time-varying AF recurrence risk forecasts and enables stratification into clinically meaningful risk groups. 
This framework has the potential to transform post-ablation management into a proactive paradigm and to support informed clinical decision-making prior to ablation.

12
EXHEART: A Fairness-Aware Explainable Stacked Ensemble for Cardiovascular Disease Classification with Cross-Instrument Disparity Attribution

Biswas, M. A.; Laila, A.

2026-06-05 health informatics 10.64898/2026.06.03.26354879 medRxiv
Top 0.1%
7.0%

Background: Machine learning models trained on population health surveys offer scalable tools for cardiovascular screening, but recurring methodological weaknesses undermine their credibility and equity: data leakage from synthetic oversampling, qualitative rather than quantitative explainability evaluation, and the absence of demographic fairness auditing at the clinical operating threshold. Methods: We present EXHEART, a leakage-free stacked ensemble pipeline trained on BRFSS 2015 (n = 253,680) and validated on BRFSS 2020 (n = 319,795; temporal transport and retrain) and a clinical cardiovascular examination dataset (n = 68,730). The pipeline combines XGBoost, LightGBM, Random Forest, and a multi-layer perceptron as base learners with 5-fold out-of-fold logistic regression stacking and Platt scaling calibration. A quantitative SHAP-LIME consistency framework, based on Kendall-tau rank correlation and Jaccard overlap, accompanies a decision-curve analysis, a subgroup-stratified SHAP interaction analysis, and an intersectional fairness audit (Sex x Age x Income) with threshold-shifting mitigation and a frontier of the fairness-utility trade-off. The framework also adds cross-instrument fairness-disparity attribution, an empirical diagnostic that provides evidence on whether an observed subgroup disparity is more consistent with a measurement-induced or a substantive explanation by re-validating it on a dataset that measures the same clinical construct objectively. On heart disease, this diagnostic associates 89% of the sex TPR gap (95% CI [0.65, 0.99]) with the self-reported survey outcome rather than with a substantive risk difference. Results: On BRFSS 2015, EXHEART achieves AUC-ROC = 0.850, AUPRC = 0.371, Brier score = 0.071, and reduces ECE by 96% (0.256 to 0.011) via Platt scaling. 
Global SHAP-LIME rank agreement is moderate-to-strong (Kendall-tau = 0.580, Spearman-rho = 0.818) with a substantial top-3 divergence (Jaccard@3 = 0.200), where Stroke flips from SHAP rank 8 to LIME rank 1. The Sex TPR gap is 0.124 at the screening threshold; intersectional Sex x Age disparities reach 0.649 among adequately-powered cells, 5.2x the single-attribute gap. Temporal transport to BRFSS 2020 collapses sensitivity from 0.776 to 0.267, while retraining restores AUC = 0.840 and ECE = 0.012. On clinical examination data, the Sex TPR gap collapses to 0.014; the attribution test indicates this gap is instrument-dependent, consistent with a measurement or outcome-definition explanation rather than a substantive risk difference. Cross-domain SHAP analysis identifies four instrument-independent CVD risk factors and two major portability failures. Conclusions: EXHEART combines three practices that population-scale cardiovascular classifiers usually apply in isolation: leakage-free training with calibrated probabilities, a test of whether the model's explanations are stable, and a fairness audit that examines intersecting subgroups rather than single attributes. Bringing them together proved worthwhile. The intersectional audit revealed disparities that single-attribute auditing missed, and the cross-instrument comparison indicated that much of the sex gap reflects how the outcome is measured in survey data rather than a substantive difference in risk. The temporal transport findings indicate that deployed BRFSS models warrant periodic monitoring and retraining to maintain clinical utility. EXHEART is a retrospective methodological evaluation on public de-identified data; it is not validated for direct clinical decision-making, diagnosis, or treatment recommendation without prospective clinical validation.
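Several of the summary figures in the abstract above are simple ratios that can be reproduced from the reported values. The sketch below shows a top-k Jaccard overlap between two feature rankings, plus the ECE reduction and disparity ratio. The feature names and rankings are hypothetical placeholders chosen only so that exactly one feature is shared in the top 3; they are not the study's actual SHAP or LIME rankings.

```python
def jaccard_at_k(ranking_a, ranking_b, k):
    """Jaccard overlap between the top-k items of two rankings."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)

# Hypothetical rankings: one shared feature among the two top-3 lists.
shap_rank = ["feat_a", "feat_b", "feat_c", "feat_d"]
lime_rank = ["feat_c", "feat_e", "feat_f", "feat_a"]
overlap = jaccard_at_k(shap_rank, lime_rank, 3)  # 1 shared / 5 distinct features

# Values reported in the abstract.
ece_before, ece_after = 0.256, 0.011
ece_relative_reduction = (ece_before - ece_after) / ece_before

sex_tpr_gap, intersectional_gap = 0.124, 0.649
disparity_ratio = intersectional_gap / sex_tpr_gap
```

A Jaccard@3 of 0.200 is what one shared feature out of five distinct top-3 features would give, and the reported 96% ECE reduction and 5.2x disparity ratio match the underlying numbers after rounding.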

13
Rationale and Design of an Artificial Intelligence Model for Diastolic Heart Failure (AID-HF): A Canadian Cardiomyopathy Collaborative (C3) Study

Papaz, T.; Patel, S.; Akilen, R.; Min, S.; Lesurf, R.; Rouleau, J.-L.; Ruiz, M.; Lam, C. Z.; Dragulescu, A.; Friedberg, M. K.; Mertens, L.; Tremblay-Gravel, M.; Krahn, A. D.; Tadros, R.; Mital, S.

2026-05-29 cardiovascular medicine 10.64898/2026.05.27.26354226 medRxiv
Top 0.1%
6.7%

Diastolic heart failure (HF) in primary cardiomyopathy is under-recognized and often diagnosed late, particularly in children. While recent studies have advanced understanding of HF with preserved ejection fraction in older adults, the prevalence, outcomes and molecular drivers of diastolic HF remain poorly defined in pediatric and young adult cardiomyopathy, where disease is typically driven by primary myocardial disease rather than acquired co-morbidities. The Canadian Cardiomyopathy Collaborative (C3) was assembled to leverage three of Canada's leading pediatric and adult cardiomyopathy biobank registries. Its flagship initiative, Artificial Intelligence to Model Diastolic Heart Failure (AID-HF), aims to integrate deep phenotyping, including comprehensive diastolic function assessment, with genomics, lipidomics and proteomics and apply machine learning to identify biological and clinical signatures that drive cardiac function and outcomes in cardiomyopathy. Harmonized phenotyping and multiomics protocols across registries will create a uniquely integrated national data resource and enable the goals of AID-HF, i.e., earlier diagnosis and new therapeutic targets for diastolic HF in cardiomyopathy.

14
Baseline substrate and response after cardiac resynchronization therapy in non-left bundle branch block heart failure

Liang, Y.; Zhu, Y.; Wang, R.; Gu, R.; Sang, C.; Bao, Z.; Sun, L.; Xia, T.; Xiang, G.

2026-05-19 cardiovascular medicine 10.64898/2026.05.14.26353260 medRxiv
Top 0.1%
6.6%

Background: Response to cardiac resynchronization therapy (CRT) is heterogeneous in patients with non-left bundle branch block (non-LBBB) heart failure. Whether pre-implant substrate or procedural characteristics provide the more stable framework for predicting 1-year echocardiographic response remains uncertain. Methods: We retrospectively analyzed 120 non-LBBB patients undergoing CRT. The primary logistic model included left ventricular end-diastolic diameter (LVEDD), left ventricular ejection fraction (LVEF), left atrial diameter, log-transformed NT-proBNP, baseline QRS duration, fragmented QRS burden across V1-V6 leads, and pulmonary artery pressure. Missing predictor data were handled using multiple imputation with 20 datasets. Model performance was assessed using bootstrap internal validation and recalibration. A prespecified procedural extension added pacing strategy, posterolateral biventricular left ventricular lead location, left ventricular pacing threshold, and right ventricular lead position. Exploratory phenotyping and sensitivity analyses were performed. Results: Echocardiographic response occurred in 51 patients (42.5%). LVEDD (OR, 0.899 [95% CI, 0.826-0.978]; P=0.013) and LVEF (OR, 1.068 [95% CI, 1.000-1.140]; P=0.050) were the most informative predictors. The primary model showed apparent AUC 0.811 and Brier score 0.173, with optimism-corrected AUC 0.766 and calibration slope 0.765. Procedural extension showed no retained incremental value after validation. Exploratory phenotyping identified three response patterns with moderate stability. Conclusions: In non-LBBB CRT, baseline structural, biomarker, and electrocardiographic substrate provided the most stable framework for predicting 1-year echocardiographic response. Procedural variables added limited retained value, suggesting that pacing strategy should be interpreted alongside baseline substrate.

15
Redefining Non Invasive Post Transplant Surveillance: A Bayesian Meta Analysis and Decision Curve Framework for Donor Derived Cell Free DNA in Heart Transplantation

John, J. D.; Henna, F.; Waseem, F.; Hassan, M. A.; Bacha, Z.; Mukhlis, M.; Mohammed, B. K.; Cheema, S.; Shah, K.

2026-05-22 cardiovascular medicine 10.64898/2026.05.15.26353184 medRxiv
Top 0.1%
6.4%

Donor-derived cell-free DNA (ddcfDNA) is increasingly used for non-invasive post-transplantation surveillance; however, its clinical interpretation remains inconsistent: reported thresholds range widely, and the test is typically applied as a single binary cutoff in the literature. The optimal decision framework for rule-out and rule-in decisions, and whether a single threshold remains clinically meaningful, are currently uncertain. We performed a Bayesian hierarchical summary receiver operating characteristic (HSROC) meta-analysis of 14 studies (1,763 patients) evaluating ddcfDNA against endomyocardial biopsy. To account for serial testing within individuals, we applied a cluster-corrected design effect, reducing 6,103 observations to 2,518 effective tests. Threshold-dependent sensitivity and specificity were modelled continuously. We compared a conventional single-threshold approach (Youden index) with a data-driven adaptive framework defining rule-out and rule-in thresholds. Clinical utility was evaluated using decision curve analysis across a range of rejection prevalences (10% to 30%), incorporating repeat-testing strategies. The pooled area under the HSROC curve was 0.78 (95% CrI, 0.67 to 0.84). The Youden-optimal threshold (0.20%) yielded balanced sensitivity (0.77) and specificity (0.77) but failed to support clinical objectives of diagnosis. An adaptive framework identified a rule-out threshold of 0.16% (sensitivity 0.80) and a rule-in threshold of 0.48% (specificity 0.90), defining an indeterminate (grey) zone between them. Across all prevalence scenarios, ddcfDNA-guided strategies provided positive net benefit compared with biopsy-all and biopsy-none approaches. A repeat-if-borderline strategy consistently achieved the highest net benefit, particularly in low- and intermediate-risk settings, by reducing false-positive biopsies without materially compromising detection. A single-threshold interpretation is not clinically adequate for post-heart-transplant surveillance. Our tri-state, prevalence-aware framework, which integrates rule-out, indeterminate, and rule-in zones with selective repeat testing, more accurately reflects biomarker behavior and improves clinical decision making. These findings support a shift away from binary thresholds toward dynamic, context-dependent use of ddcfDNA in transplant surveillance.
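
As an illustration of two quantities mentioned in this abstract, the sketch below computes the implied design effect (raw observations divided by effective tests) and the standard decision-curve net benefit at a single operating point. It is a minimal sketch, not the authors' analysis: the function names, the example prevalence, and the threshold probability are assumptions chosen for illustration, and the repeat-testing strategies described in the abstract are not modelled.

```python
def design_effect(n_observations: int, n_effective: float) -> float:
    """Ratio of raw observations to effective independent tests."""
    if n_effective <= 0:
        raise ValueError("n_effective must be positive")
    return n_observations / n_effective


def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, threshold_probability: float) -> float:
    """Standard decision-curve net benefit per patient tested.

    NB = TP rate - FP rate * pt / (1 - pt), where pt is the threshold
    probability at which a clinician would choose to biopsy.
    """
    if not 0.0 < threshold_probability < 1.0:
        raise ValueError("threshold_probability must be in (0, 1)")
    weight = threshold_probability / (1.0 - threshold_probability)
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate - false_positive_rate * weight


if __name__ == "__main__":
    # Observation counts quoted in the abstract.
    print("design effect:", round(design_effect(6103, 2518), 2))

    # Hypothetical scenario: 20% rejection prevalence, 20% threshold probability.
    prevalence, pt = 0.20, 0.20
    youden = net_benefit(0.77, 0.77, prevalence, pt)      # Youden operating point
    biopsy_all = net_benefit(1.0, 0.0, prevalence, pt)    # everyone biopsied
    biopsy_none = net_benefit(0.0, 1.0, prevalence, pt)   # no one biopsied
    print("net benefit (Youden point):", round(youden, 3))
    print("net benefit (biopsy all):", round(biopsy_all, 3))
    print("net benefit (biopsy none):", round(biopsy_none, 3))
```

Sweeping `threshold_probability` and `prevalence` over a grid would trace out decision curves of the kind the abstract describes, under the same simplifying assumptions.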

16
Predicting 30-Day Heart Failure Readmissions Using Machine Learning: Insights From the Kansas Health Information Network (KHIN)

Kim, M.; Yan, J.; Wasfy, J. H.; Aseltine, R.

2026-05-21 cardiovascular medicine 10.64898/2026.05.18.26353537 medRxiv
Top 0.1%
4.8%

Background: Heart failure (HF) is a major contributor to inpatient hospital utilization, with persistently high 30-day readmission rates. Existing prediction tools are frequently restricted to primary-diagnosis HF admissions, potentially excluding clinically relevant HF-related hospitalizations. Objectives: To develop and validate machine learning (ML)-based risk prediction models for 30-day readmissions among patients with HF using the Kansas Health Information Network, a statewide health information exchange. Methods: This retrospective cohort study analyzed HF hospitalizations using predictors including demographics, comorbidities, laboratory results, medications, clinical quality metrics for diabetes and kidney disease management, and prior healthcare utilization. Five ML models, including regularized logistic regression, random forest, extreme gradient boosting, categorical boosting, and deep neural network, were trained using stratified 5-fold cross-validation. Model performance was evaluated on an independent test set using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), misclassification rate (MCR), and Brier score. Results: Among 2,734 HF patients, the 30-day readmission rate was 27%. The XGBoost model achieved the best discrimination (AUROC=0.75; AUPRC=0.58; MCR=0.21). Patients in the highest-risk decile had a positive predictive value of 76%, accounted for approximately one-third of all 30-day readmissions, and had a 3.3-fold enrichment compared with baseline risk. The key predictors included prior hospital utilization, diabetes and kidney disease management indicators, and comorbidity burden. Conclusions: Risk stratification using routinely collected clinical data identified a subgroup at elevated risk for 30-day readmission. These findings support the potential role of data-driven risk prediction to inform targeted transitional care.

17
An AI-assisted feasibility evaluation of three photoplethysmography-derived microvascular reactivity signals in MIMIC-IV-WDB v0.1.0

Landry, T. C.; Kim, Y.

2026-06-06 health informatics 10.64898/2026.06.03.26354863 medRxiv
Top 0.1%
4.3%

Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth. The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
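
The approximately 318 ms figure quoted above matches the time constant of a first-order high-pass filter, tau = 1 / (2 * pi * f_c), evaluated at f_c = 0.5 Hz. The sketch below shows that arithmetic and a simple discrete step-response check. It assumes a first-order (single-pole) filter and a hypothetical 1000 Hz sample rate; the abstract does not state the filter order or sampling rate used in the study, and the function names are illustrative.

```python
import math


def highpass_time_constant_s(cutoff_hz: float) -> float:
    """Time constant (seconds) of a first-order high-pass filter."""
    if cutoff_hz <= 0:
        raise ValueError("cutoff_hz must be positive")
    return 1.0 / (2.0 * math.pi * cutoff_hz)


def step_response_decay_time_s(cutoff_hz: float,
                               sample_rate_hz: float = 1000.0) -> float:
    """Time for a discrete first-order high-pass step response to fall to 1/e.

    Uses the common recurrence y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    with alpha = tau / (tau + dt), driven by a unit step at n = 0.
    """
    tau = highpass_time_constant_s(cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = tau / (tau + dt)
    y = alpha              # output at n = 0, the instant of the step
    target = y / math.e    # 1/e of the initial output
    n = 0
    while y > target:
        y *= alpha         # input is constant after the step, so y only decays
        n += 1
    return n * dt


if __name__ == "__main__":
    tau = highpass_time_constant_s(0.5)
    print(f"analytic time constant: {tau * 1000:.1f} ms")
    print(f"simulated 1/e decay time: {step_response_decay_time_s(0.5) * 1000:.0f} ms")
```

Because the filter's own decay sits in the same range as a physiological refill-like time constant, an exponential fitted to filtered data can recover the filter rather than the signal, which is the pitfall the abstract reports.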

18
CarotidMamba: Foundation Model-Enabled CTA Phenotyping of Symptomatic Carotid Plaques in a Multi-Center Retrospective Study

Liu, Y.-S.; Dou, X.-W.; Zheng, P.-Y.; Feng, W.; Ma, L.-J.; You, Y.-N.; Shao, G.-W.; Shen, J.-G.; Yu, X.; Qiao, C.; Cheng, Z.-W.; Li, Z.-W.; Su, F.; Zhang, B.-W.; Qu, X.-H.; Jiang, G.

2026-06-05 cardiovascular medicine 10.64898/2026.06.02.26354776 medRxiv
Top 0.1%
4.3%

Background: Treatment decisions for carotid atherosclerotic disease rely primarily on luminal stenosis, although plaque vulnerability and symptomatic status better reflect short-term cerebrovascular risk. A scalable CTA tool for automated phenotyping of symptomatic carotid disease is lacking. Materials & Methods: In this multi-institutional retrospective study, 689 patients (mean age, 67.9 ± 7.7 years; 366 men) from four hospitals were analyzed after screening 705 CTA examinations. 423 patients from one center were used for five-fold development and internal validation, and 266 patients from three centers for independent external validation. CarotidMamba, a deep learning framework combining dual foundation-model encoders with Mamba-based sequence modeling, was developed and benchmarked against clinical, radiomics, clinic-radiomics, CNN, and transformer comparators. Results: In the development cohort, CarotidMamba achieved an AUC of 0.839 (95% CI, 0.799-0.879) and accuracy of 0.825 (95% CI, 0.793-0.857), outperforming the strongest comparator by 0.066 and 0.050, respectively. External validation yielded AUCs of 0.897 (95% CI, 0.835-0.959) in YCH, 0.809 (95% CI, 0.720-0.898) in DCH, and 0.762 (95% CI, 0.649-0.875) in GH-NTC. CarotidMamba showed the lowest Brier score and expected calibration error across cohorts, with calibration slopes near 1.0. Conclusion: CarotidMamba provides an interpretable, clinically oriented, and externally validated CTA framework for phenotyping symptomatic carotid plaques, supporting vulnerability-aware imaging assessment beyond stenosis alone.

19
Left Atrial Stiffness Trajectories Identify Distinct Prognostic Phenotypes in Heart Failure with Reduced Ejection Fraction

Sun, J.; Park, J.; Bae, N. Y.; Lim, J.; Kwak, S.; Bak, M.; Choi, H.-M.; Park, J.-B.; Yoon, Y. E.; Lee, S. P.; Kim, Y.-J.; Cho, G.-Y.; Kim, H. K.; Hwang, I.-C.

2026-05-22 cardiovascular medicine 10.64898/2026.05.20.26353741 medRxiv
Top 0.1%
4.3%

Background: Treatment response in heart failure with reduced ejection fraction (HFrEF) is assessed predominantly through left ventricular (LV) functional recovery, while longitudinal changes in left atrial (LA) hemodynamic burden remain underexplored. The LA stiffness index (LASI), derived from E/e' and LA reservoir strain, integrates LV filling pressure and LA compliance. Objectives: We investigated longitudinal trajectories of LASI and their prognostic implications in HFrEF treated with angiotensin receptor-neprilysin inhibitor (ARNI)-based therapy. Methods: From the multicenter STRATS-HF-ARNI registry, 1,039 patients with HFrEF who underwent serial echocardiography at baseline and one-year follow-up were classified into four LASI trajectory patterns dichotomized at the cohort median (1.22): persistently compliant (Group A, 46.8%), reverse remodeling (B, 28.5%), progressive stiffening (C, 3.2%), and persistently stiff (D, 21.6%). Results: On multivariable Cox regression, Group D was independently associated with elevated risks of all-cause mortality (adjusted hazard ratio [aHR] 2.68, 95% CI 1.57-4.59), cardiovascular mortality (aHR 4.36, 1.97-9.64), and HF hospitalization (aHR 3.83, 2.22-6.60), whereas Group B showed outcomes comparable to Group A. One-year LASI progression independently predicted all three outcomes. LASI elevation at one year predicted adverse outcomes even among patients with recovered LV function, and LASI trajectory classification provided incremental prognostic discrimination beyond conventional diastolic and strain parameters. Among sinus-rhythm patients (n=786), Group C exhibited the highest risk of new-onset atrial fibrillation. Conclusions: In HFrEF treated with ARNI-based therapy, LASI trajectories identify distinct prognostic phenotypes. Persistent LA stiffness confers adverse outcomes independent of LV recovery, and serial LASI assessment may enhance risk stratification beyond LV-centric metrics.

20
ECG-derived age deviation predicts cardiovascular diseases across lead configurations and cohorts

Aydogdu, D.; Gaber, F.; Sorooshmehr, A.; Akalin, A.

2026-06-08 cardiovascular medicine 10.64898/2026.06.05.26354974 medRxiv
Top 0.1%
4.3%

Cardiovascular diseases (CVDs) remain the primary global health burden, motivating the search for robust, non-invasive risk biomarkers. We harness a foundation model pretrained on over 10 million recordings to evaluate ECG-derived age deviation as a cross-cohort biomarker of CVD burden. A predictive model, trained exclusively on healthy subjects, achieved accurate age prediction. Diseased subjects exhibited significant positive age acceleration across multiple categories, with structural and ischemic heart diseases showing the largest effects. External validation in a hospital-based cohort (n=160,493) confirmed that age acceleration independently predicts all-cause mortality, with the strongest prognostic value in patients under 65 years. Furthermore, we demonstrated that disease discrimination and mortality prediction are preserved across 6-lead and single-lead configurations, supporting potential deployment in wearable or mobile devices. Our analysis also revealed a striking morphological confound from complete left bundle branch block, leading us to propose absolute age deviation as a more robust, universal risk marker. These findings establish ECG-derived biological age deviation as a highly generalizable and clinically actionable biomarker for assessing cardiovascular risk. We have also developed a web application at https://bioinformatics.mdc-berlin.de/ECGage that allows users to easily test our framework.